Generic rank-one corrections for value iteration in Markovian decision problems

Author

  • Dimitri P. Bertsekas
Abstract

Given a linear iteration of the form x := F(x), we consider modified versions of the form x := F(x + γd), where d is a fixed direction and γ is chosen to minimize the norm of the residual ‖x + γd − F(x + γd)‖. We propose ways to choose d so that the convergence rate of the modified iteration is governed by the subdominant eigenvalue of the original. In the special case where F relates to a Markovian decision problem, we obtain a new extrapolation method for value iteration. In particular, our method accelerates the Gauss-Seidel version of the value iteration method for discounted problems in the same way that MacQueen's error bounds accelerate the standard version. Furthermore, our method applies equally well to Markov renewal and undiscounted problems.

¹ Research supported by NSF under Grant CCR-9103804. Thanks are due to David Castanon for stimulating discussions.
² Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, Mass., 02139.
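For an affine map F(x) = Ax + b, the residual x + γd − F(x + γd) is linear in γ, so the minimizing γ at each step is given by a scalar least-squares formula. The sketch below illustrates this mechanic only; the matrix, vectors, and choice of d are illustrative placeholders, not the paper's recommended directions.

```python
import numpy as np

def corrected_iteration(A, b, d, x0, iters=100):
    """Sketch of the corrected iteration x := F(x + g*d) for an affine
    map F(x) = A x + b, with g chosen at each step to minimize the
    residual norm ||y - F(y)|| over y = x + g*d."""
    x = x0.astype(float)
    M = np.eye(len(x0)) - A        # residual operator: y - F(y) = M y - b
    Md = M @ d                     # residual is linear in g: r(g) = (M x - b) + g * M d
    for _ in range(iters):
        r0 = M @ x - b
        g = -(r0 @ Md) / (Md @ Md) # scalar least-squares minimizer of ||r0 + g*Md||
        x = A @ (x + g * d) + b    # apply F at the extrapolated point
    return x
```

For a contraction A, the iterate converges to the fixed point of F; the point of the correction is that a well-chosen d removes the dominant eigenvalue's contribution, leaving the subdominant eigenvalue to govern the rate.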


Similar articles

Acceleration Operators in the Value Iteration Algorithms for Average Reward Markov Decision Processes

One of the most widely used methods for solving average cost MDP problems is the value iteration method. This method, however, is often computationally impractical and restricted in size of solvable MDP problems. We propose acceleration operators that improve the performance of the value iteration for average reward MDP models. These operators are based on two important properties of Markovian ...


A New Value Iteration Method for the Average Cost Dynamic Programming Problem

We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and that, furthermore, there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. Contrary to the st...


Affine Monotonic and Risk-Sensitive Models in Dynamic Programming

In this paper we consider a broad class of infinite horizon discrete-time optimal control models that involve a nonnegative cost function and an affine mapping in their dynamic programming equation. They include as special cases classical models such as stochastic undiscounted nonnegative cost problems, stochastic multiplicative cost problems, and risk-sensitive problems with exponential cost. ...


Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems (LIDS Report 2871)

We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the s...


Application of variational iteration method for solving singular two point boundary value problems

In this paper, He's highly prolific variational iteration method is applied effectively for showing the existence, uniqueness and solving a class of singular second order two point boundary value problems. The process of finding solution involves generation of a sequence of appropriate and approximate iterative solution function equally likely to converge to the exact solution of the given problem w...



Journal:
  • Oper. Res. Lett.

Volume 17, Issue 

Pages  -

Published 1995